Skip to content

fix(cli): gen2-migration lock --rollback reports false drift#14862

Open
iliapolo wants to merge 44 commits into
devfrom
epolon/e2e-rollback
Open

fix(cli): gen2-migration lock --rollback reports false drift#14862
iliapolo wants to merge 44 commits into
devfrom
epolon/e2e-rollback

Conversation

@iliapolo
Copy link
Copy Markdown
Contributor

@iliapolo iliapolo commented May 6, 2026

Description of changes

The lock --rollback command was failing because template drift detection reported false positives. For example, the RoleName property of an AWS::Cognito::UserPoolGroup resource changes during refactor, switching from the Gen1 role to the Gen2 role. Running drift detection after a successful rollback of refactor would still report this drift, even though it is expected and harmless.

This PR replaces the drift-based validation with a simpler, more reliable approach: per-resource stack integrity checks that verify expected resources still exist in the deployed stacks. If resources are missing, it means refactor --rollback wasn't run first. Here is an example of a failed validation report if lock --rollback is executed before refactor --rollback:

⚠️  WARNING: AWS Amplify Gen 1 CLI is in maintenance mode and will reach end of life on May 1, 2027.
During maintenance mode, only critical bug fixes and security patches will be provided.
Migrate to Amplify Gen 2: https://docs.amplify.aws/react/start/migrate-to-gen2/


→ Planning complete
→ Validating complete

Failed Validations Report

✘ Stack Integrity: amplify-storelocat2605052001-txuuzhyxdw-70d3e-authuserPoolGroups-1SJ18NH8VYGRJ

Following resources are missing. Did you forget to run 'amplify gen2-migration refactor --rollback'?

┌────────────────────────┬─────────────────────────────┐
│ Logical ID             │ Type                        │
├────────────────────────┼─────────────────────────────┤
│ storeLocatorAdminGroup │ AWS::Cognito::UserPoolGroup │
└────────────────────────┴─────────────────────────────┘

✘ Stack Integrity: amplify-storelocat2605052001-txuuzhyxdw-70d3e-authstorelocator41a9495f41a9495f-1M8SYEQAC0F2S

Following resources are missing. Did you forget to run 'amplify gen2-migration refactor --rollback'?

┌─────────────────────┬──────────────────────────────────────────┐
│ Logical ID          │ Type                                     │
├─────────────────────┼──────────────────────────────────────────┤
│ UserPool            │ AWS::Cognito::UserPool                   │
├─────────────────────┼──────────────────────────────────────────┤
│ UserPoolClientWeb   │ AWS::Cognito::UserPoolClient             │
├─────────────────────┼──────────────────────────────────────────┤
│ UserPoolClient      │ AWS::Cognito::UserPoolClient             │
├─────────────────────┼──────────────────────────────────────────┤
│ IdentityPool        │ AWS::Cognito::IdentityPool               │
├─────────────────────┼──────────────────────────────────────────┤
│ IdentityPoolRoleMap │ AWS::Cognito::IdentityPoolRoleAttachment │
└─────────────────────┴──────────────────────────────────────────┘

Validations Summary

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────┐
│ Validation                                                                                                    │ Status   │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ Stack Integrity: amplify-storelocat2605052001-txuuzhyxdw-70d3e-authuserPoolGroups-1SJ18NH8VYGRJ               │ ✘ Failed │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ Stack Integrity: amplify-storelocat2605052001-txuuzhyxdw-70d3e-authstorelocator41a9495f41a9495f-1M8SYEQAC0F2S │ ✘ Failed │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────┘

🛑 Validations failed

E2E rollback flow

Extended the e2e migration flow to exercise the full round-trip: after the forward migration completes and gen2 tests pass, the flow now runs refactor --rollbacklock --rollbackpush → gen1 tests. This validates that rollback actually works end-to-end.

userAttributes in defineAuth

In #14841, we started rendering attributes returned from the SDK description of the user pool schema. This turned out to be wrong because the response contains all possible attributes, not only the ones that were actually configured. This means that post-refactor the user pool is trying to undergo an illegal update and fails.

This PR reverts the logic back to only render required attributes, which is what userAttributes seems to be dedicated for and is the Gen1 user actually set.

Make auth cognito refactorers idempotent for social auth resources

Before: Running rollback a second time after a successful first run would unconditionally orphan the Gen2 social auth resources (identity providers, user pool domain) again, even though the holding stack had already been torn down. This left those resources detached from any stack with no way to recover them.

After: The rollback refactorer now checks whether the holding stack still exists before attempting the orphan step. If it's already gone (i.e., a previous rollback succeeded), the operation is skipped entirely, making rollback safe to re-run without risking resource loss. The same guard was added to the forward path symmetrically. Additionally, the orphan table now displays physical resource IDs so operators can verify exactly which deployed resources are affected.

Issue #, if available

N/A

Description of how you validated changes

  • Unit tests updated and passing (lock.test.ts reduced from 26 to 11 focused tests)
  • E2E validation pending via yarn cloud-e2e

Checklist

  • PR description included
  • yarn test passes
  • Tests are changed or added
  • Relevant documentation is changed or added (and PR referenced)
  • New AWS SDK calls or CloudFormation actions have been added to relevant test and service IAM policies
  • Pull request labels are added

iliapolo added 2 commits May 5, 2026 21:45
…integrity checks

Template drift detection was reporting false positives during lock
--rollback, blocking the rollback from completing. Replace it with
per-resource stack integrity validation that checks whether expected
resources still exist in the deployed stacks.

Also renames config keys from lock/refactor to lockForward/refactorForward
to distinguish forward and rollback directions, adds rollback steps to
the e2e migration flow, and extends cfn-output-resolver to fall back to
physical resource ID when resolving Ref.
---
Prompt: commit my changes. no tests. just commit.
Restore the original curated resource type list for DeletionPolicy
instead of applying Retain to all non-stack resources. Update lock
rollback tests and post-refactor snapshots accordingly.
---
Prompt: made more changes. commit.
@iliapolo iliapolo changed the title Epolon/e2e rollback fix(cli): gen2-migration lock --rollback reports false drift May 6, 2026
@iliapolo iliapolo marked this pull request as ready for review May 6, 2026 17:23
@iliapolo iliapolo requested a review from a team as a code owner May 6, 2026 17:23
iliapolo added 13 commits May 6, 2026 17:19
…pping

The gen2 userAttributes property only maps to required attributes.
Optional attributes were incorrectly included, causing mismatches
with the gen2 auth construct type definition.

Also inlines the placeholder app name in cleanup-codebuild-resources
to remove the import dependency on amplify-e2e-core.
---
Prompt: commit my changes. no tests. no build. just commit.
Comment thread amplify-migration-apps/mood-board/tests/api.test.ts
Comment thread amplify-migration-apps/mood-board/tests/api.test.ts
iliapolo added 8 commits May 7, 2026 20:29
… class

Move teardown and error-handling logic from cli.ts and
run-migration-e2e.ts into a new App.e2e() method. Extract rollback
into its own public method. Update READMEs for both packages.

---
Prompt: commit all my changes as "chore"
Document all supported step keys (lockForward, lockRollback,
refactorForward, refactorRollback, generate) and both StepConfig
fields (skipValidations, skip) in the migration apps README.

---
Prompt: commit
await this.refactor(gen2StackName);

// twice for idempotancy
await this.refactorForward(gen2StackName);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add log banner to indicate execution (1) and (2)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

await this.pull();

// twice for idempotancy
await this.refactorRollback(gen2StackName);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const nestedStacks = await this.listNestedStack(this.gen1App.rootStackName);
for (const resource of this.gen1App.discover()) {
switch (resource.key) {
case 'auth:Cognito':
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if there is an existing function (getTemplatePathForResource or something) in the cli codebase.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Response from Kiro:

Nothing like this exists today. The inconsistency in Amplify's template naming is the reason nobody abstracted it — it's not a clean formula. If you want to DRY it up, you'd need to introduce one yourself. 

I have no trouble believing that :)

iliapolo added 18 commits May 8, 2026 15:03
Replace 4 hardcoded box-drawing console.log banners with a reusable
printBanner(text) function that dynamically centers the given text
inside a fixed-width border. Also removes a duplicate comment in the
rollback method.

---
Prompt: in app.ts, refactor the hardcoded box-drawing console.log
banner to a function that accepts a string, with alignment that
adjusts automatically.
…banner alignment

Extract a new public forward() method from migrate() to separate the
forward refactor + redeploy phase into its own lifecycle step.
Restructure e2e() to call migrate(), rollback(), forward(), and retain()
as distinct banner-delimited phases for clearer log output.

Fix printBanner alignment bug where the border used width=60 but
content lines used innerWidth=58, causing a 2-char mismatch.

---
Prompt: commit my changes (alignment fix + forward method extraction)
Replace the execa shell-out to `aws sts get-caller-identity` in
bootstrapCDK with a direct STSClient SDK call. This removes the
dependency on the AWS CLI binary being available at runtime and
uses the same credential resolution (fromIni) as the rest of the
codebase.

Also add extendEnv: false to the cdk bootstrap execa call for
consistency with other subprocess invocations, and bump the
migration e2e test timeout from 2h to 3h to account for the full
rollback + forward + retain lifecycle.
---
Prompt: replace execa aws sts get-caller-identity with SDK calls
Add post-rollback lifecycle hook to the E2E system and
corresponding scripts for all migration apps. The post-rollback
script reverses the edits applied by post-refactor (e.g.,
re-commenting the postRefactor() call in amplify/backend.ts).

Changes:
- Add postRollback() method to App class and wire it into the
  rollback flow in e2e()
- Remove the throw that was blocking the rollback code path
- Add printStepBanner for sub-step visual separation
- Fix printPhaseBanner to use Unicode box-drawing escapes
- Add post-rollback.ts to all 9 apps (mood-board also reverts
  the Kinesis stream name)
- Add "post-rollback" npm script to all app package.json files
- Document post-rollback.ts in the migration apps README
- Fix store-locator auth test to ignore LimitExceededException
  for daily email limit errors
- Include auth-cognito forward/rollback holding stack changes
- Include credentials and shared-scripts fixes
---
Prompt: In the store locator gen2-migration app, add a
post-rollback script that comments back the thing that
uncommented during post-refactor. Then add to all apps.
…rollback

Add tests verifying that the forward and rollback refactorers
correctly handle the presence of a holding stack when deciding
whether to produce orphan/import operations for social auth
resources.

Forward:
- beforeMove skips orphan when holding stack exists
- beforeMove includes orphan when no holding stack
- move skips import when holding stack exists
- move includes import when no holding stack

Rollback:
- move includes orphan when holding stack exists
- move skips orphan when no holding stack
- afterMove includes import when holding stack exists
- afterMove skips import when no holding stack
---
Prompt: In auth cognito forward and rollback tests, add test
cases for our changes. We need to make sure that both take into
account the existence of the holding stack when creating
import/orphan operations.
… query limit

Add consistentRead: true to the finance-tracker custom VTL
resolver template to avoid stale reads in the E2E test.
Increase the query limit from 10 to 1000 to ensure all test
records are returned.
---
Prompt: made some changes to finance tracker app, commit.
…ion tests

Document that gen2 migration E2E tests take around two hours
per app due to running the full workflow twice (forward +
rollback), and explain why the standard retry function cannot
be used (token expiry).
---
Prompt: In the e2e tests readme, under Gen2 Migration E2E
Tests, add a note about retry and duration.
Move the CodeBuild debug flag into the args array before
constructing the command string, so the logged command
accurately reflects what is executed.
---
Prompt: made a change, commit.
…ests

AppSync resolvers apply a default scan limit of 100 items. As test
tables accumulate items across runs, list queries may not return the
newly created item in the first page, causing sporadic
"expect(found).toBeDefined()" failures.

Add `limit: 1000` to all list query calls that are followed by an
`items.find()` assertion across all 6 affected migration apps:
discussions, imported-resources, media-vault, mood-board,
product-catalog, and project-boards.

---
Prompt: Im seeing sporadic failures in the "lists topics including
a newly created one" in the discussions gen2-migration app. the
error is that "expect(found).toBeDefined();" returns undefined. I
think it has to do with Amplify resolvers applying a default limit
of 100 for scan operations, and the tests might be writing more
than a 100 items prior to this specific test executed. look at all
our apps and their tests, are there any other such problems? fix
them all.
The --debug flag produces excessive output during amplify push and is
not needed for normal e2e test runs.

---
Prompt: made changes, commit
@iliapolo iliapolo enabled auto-merge (squash) May 11, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant